Smiley face

Hunt-Mcilroy Algorithm(Unix-diff, 1976)


Main features
Description

The program diff reports differences between two files, expressed as a minimal list of line changes to bring either file into agreement with the other. Diff has been engineered to make efficient use of time and space on typical inputs that arise in vetting version-to-version changes in computer-maintained or computer-generated documents. Time and space usage are observed to vary about as the sum of the file lengths on real data, although they are known to vary as the product of the file lengths in the worst case. The central algorithm of diff solves the longest common subsequence problem’ to find the lines that do not change between files. Practical efficiency is gained by attending only to certain critical ‘candidate’ matches between the files, the breaking of which would shorten the longest subsequence common to some pair of initial segments of the two files. Various techniques of hashing, presorting into equivalence classes, merging by binary search, and dynamic storage allocation are used to obtain good performance.

C source code
int Diff_Alg(char *stringA, char *stringB, int m, int n)
{
   int i, j, rowMax = 0;
   kCandidateList *kList;

   kList = (kCandidateList*) calloc(min(m, n) + 1, sizeof(kCandidateList));

   // Find Match
   for (i = 1; i <= m; i++){
      for (j = n; j >= 1; j--){
         if (stringA[i] == stringB[j]){
            rowMax = HMAlg(kList, j, i, rowMax);
         }
      }
   }

   return rowMax;
}
The files
  main.c   lcslib.h   Diff_Alg.exe

Reference
J. W. Hunt and M. D. McIlroy, "An Algorithm for Differential File Comparison," Computing Science Technical Report 41, Bell Laboratories (1975).